Overview

Dataset statistics

Number of variables: 28
Number of observations: 13510
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.8 MiB
Average record size in memory: 217.0 B

Variable types

Numeric: 12
Categorical: 15
Boolean: 1

Alerts

city has a high cardinality: 177 distinct values [High cardinality]
fireplace has a high cardinality: 71 distinct values [High cardinality]
parking has a high cardinality: 272 distinct values [High cardinality]
sewer has a high cardinality: 88 distinct values [High cardinality]
water has a high cardinality: 74 distinct values [High cardinality]
app has a high cardinality: 145 distinct values [High cardinality]
heating has a high cardinality: 90 distinct values [High cardinality]
cooling has a high cardinality: 116 distinct values [High cardinality]
materials has a high cardinality: 161 distinct values [High cardinality]
roof has a high cardinality: 286 distinct values [High cardinality]
interior has a high cardinality: 448 distinct values [High cardinality]
price is highly overall correlated with bath and 1 other field [High correlation]
lat is highly overall correlated with state [High correlation]
long is highly overall correlated with state and 1 other field [High correlation]
bath is highly overall correlated with price and 2 other fields [High correlation]
bed is highly overall correlated with bath and 1 other field [High correlation]
living is highly overall correlated with price and 2 other fields [High correlation]
covered is highly overall correlated with garage and 1 other field [High correlation]
garage is highly overall correlated with covered and 1 other field [High correlation]
total_spaces is highly overall correlated with covered and 1 other field [High correlation]
add_attr is highly overall correlated with state and 2 other fields [High correlation]
state is highly overall correlated with lat and 2 other fields [High correlation]
sewer is highly overall correlated with add_attr [High correlation]
water is highly overall correlated with long and 1 other field [High correlation]
status is highly imbalanced (67.3%) [Imbalance]
fireplace is highly imbalanced (60.8%) [Imbalance]
subtype is highly imbalanced (74.7%) [Imbalance]
sewer is highly imbalanced (63.3%) [Imbalance]
water is highly imbalanced (57.5%) [Imbalance]
heating is highly imbalanced (52.4%) [Imbalance]
roof is highly imbalanced (60.7%) [Imbalance]
foundation is highly imbalanced (53.3%) [Imbalance]
covered is highly skewed (γ1 = 106.680655) [Skewed]
garage is highly skewed (γ1 = 87.84361272) [Skewed]
total_spaces is highly skewed (γ1 = 64.10821512) [Skewed]
year is highly skewed (γ1 = 34.18662349) [Skewed]

Reproduction

Analysis started: 2023-02-11 18:47:47.580947
Analysis finished: 2023-02-11 18:48:41.999115
Duration: 54.42 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json

Variables

price
Real number (ℝ)

Distinct: 2726
Distinct (%): 20.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1242986.5
Minimum: 0
Maximum: 1.65 × 10^8
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 105.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 168500
Q1: 309900
median: 510000
Q3: 980000
95-th percentile: 3590947.7
Maximum: 1.65 × 10^8
Range: 1.65 × 10^8
Interquartile range (IQR): 670100

Descriptive statistics

Standard deviation: 4077021.9
Coefficient of variation (CV): 3.280021
Kurtosis: 510.66836
Mean: 1242986.5
Median Absolute Deviation (MAD): 245075
Skewness: 18.067013
Sum: 1.6792748 × 10^10
Variance: 1.6622108 × 10^13
Monotonicity: Not monotonic
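
Several of the price figures above are derived from one another. The following is a minimal sketch, using only numbers copied from this report, that recomputes the distinct share, coefficient of variation, variance, IQR, and sum; the variable names are illustrative and not part of the profiling tool.

```python
import math

# Values copied from the price summary in this report.
n_rows = 13510
distinct = 2726
mean = 1242986.5
std = 4077021.9
q1, q3 = 309900, 980000

distinct_pct = 100 * distinct / n_rows  # share of distinct values
cv = std / mean                         # coefficient of variation
variance = std ** 2                     # variance is the squared standard deviation
iqr = q3 - q1                           # interquartile range
total = mean * n_rows                   # sum = mean * number of observations

print(f"Distinct (%): {distinct_pct:.1f}")
print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.7e}")
print(f"IQR: {iqr}")
print(f"Sum: {total:.7e}")
```

A CV above 3 together with the reported skewness of about 18 suggests the mean is dominated by a small number of very expensive listings, so the median is likely the more representative central value here.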
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
350000 | 119 | 0.9%
299900 | 86 | 0.6%
325000 | 85 | 0.6%
650000 | 81 | 0.6%
399900 | 80 | 0.6%
375000 | 79 | 0.6%
450000 | 78 | 0.6%
399000 | 74 | 0.5%
300000 | 74 | 0.5%
275000 | 73 | 0.5%
Other values (2716) | 12681 | 93.9%

Value | Count | Frequency (%)
0 | 2 | < 0.1%
700 | 1 | < 0.1%
19900 | 1 | < 0.1%
30000 | 2 | < 0.1%
32900 | 1 | < 0.1%
35000 | 1 | < 0.1%
35700 | 1 | < 0.1%
38900 | 1 | < 0.1%
39900 | 1 | < 0.1%
40000 | 1 | < 0.1%

Value | Count | Frequency (%)
165000000 | 1 | < 0.1%
150000000 | 1 | < 0.1%
139000000 | 1 | < 0.1%
87000000 | 1 | < 0.1%
85000000 | 1 | < 0.1%
77000000 | 1 | < 0.1%
69000000 | 1 | < 0.1%
65000000 | 2 | < 0.1%
63000000 | 1 | < 0.1%
59995000 | 1 | < 0.1%

status
Categorical

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 105.7 KiB

House for sale | 10734
Active | 1852
New construction | 579
New | 170
Foreclosure | 68
Other values (4) | 107

Length

Max length: 16
Median length: 14
Mean length: 12.813027
Min length: 3

Characters and Unicode

Total characters: 173104
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
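
As an illustration of how such character properties can be tallied, the sketch below uses Python's standard `unicodedata` module on the status values and row counts listed in this report. It is an assumption about the general approach, not the profiling tool's own code; `category_totals` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter

# Status values and their row counts, copied from this report.
status_counts = {
    "House for sale": 10734,
    "Active": 1852,
    "New construction": 579,
    "New": 170,
    "Foreclosure": 68,
    "Price Change": 48,
    "Coming soon": 45,
    "Auction": 9,
    "Re-activated": 5,
}

def category_totals(value_counts):
    """Count Unicode general categories over all characters, weighted by row count."""
    totals = Counter()
    for text, rows in value_counts.items():
        for ch in text:
            totals[unicodedata.category(ch)] += rows
    return totals

totals = category_totals(status_counts)
for code, count in totals.most_common():
    # Ll = Lowercase Letter, Zs = Space Separator, Lu = Uppercase Letter, Pd = Dash Punctuation
    print(code, count)
```

The two-letter codes are the Unicode general category abbreviations; the report shows their long names instead.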

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: House for sale
2nd row: House for sale
3rd row: House for sale
4th row: House for sale
5th row: House for sale

Common Values

Value | Count | Frequency (%)
House for sale | 10734 | 79.5%
Active | 1852 | 13.7%
New construction | 579 | 4.3%
New | 170 | 1.3%
Foreclosure | 68 | 0.5%
Price Change | 48 | 0.4%
Coming soon | 45 | 0.3%
Auction | 9 | 0.1%
Re-activated | 5 | < 0.1%
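
The Overview alert reports status as 67.3% imbalanced. One common way to score imbalance is one minus the normalised Shannon entropy of the class counts. The sketch below assumes that definition (the report itself does not state the formula) and applies it to the nine status counts; `imbalance_score` is a hypothetical helper name.

```python
import math

# Row counts of the nine status categories, copied from this report.
counts = [10734, 1852, 579, 170, 68, 48, 45, 9, 5]

def imbalance_score(class_counts):
    """Return 1 - H / log(k): 0 for a uniform column, approaching 1 as one class dominates."""
    total = sum(class_counts)
    k = len(class_counts)
    if k < 2:
        return 0.0  # a single class has no meaningful imbalance (assumed convention)
    entropy = -sum((c / total) * math.log(c / total) for c in class_counts if c > 0)
    return 1.0 - entropy / math.log(k)

score = imbalance_score(counts)
print(f"{score:.1%}")
```

Under this assumed definition the result lands on the same 67.3% as the alert, which suggests, but does not prove, that the tool uses an equivalent formula.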

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
house | 10734 | 30.1%
for | 10734 | 30.1%
sale | 10734 | 30.1%
active | 1852 | 5.2%
new | 749 | 2.1%
construction | 579 | 1.6%
foreclosure | 68 | 0.2%
price | 48 | 0.1%
change | 48 | 0.1%
coming | 45 | 0.1%
Other values (3) | 59 | 0.2%
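
The word counts in this table are consistent with lowercasing each status value and splitting it on whitespace. The sketch below assumes that tokenisation (it is inferred from the numbers, not documented in the report) and rebuilds the counts from the status values; `word_frequencies` is a hypothetical helper name.

```python
from collections import Counter

# Status values and their row counts, copied from this report.
status_counts = {
    "House for sale": 10734,
    "Active": 1852,
    "New construction": 579,
    "New": 170,
    "Foreclosure": 68,
    "Price Change": 48,
    "Coming soon": 45,
    "Auction": 9,
    "Re-activated": 5,
}

def word_frequencies(value_counts):
    """Lowercase each value, split on whitespace, and weight each token by its row count."""
    words = Counter()
    for text, rows in value_counts.items():
        for token in text.lower().split():
            words[token] += rows
    return words

words = word_frequencies(status_counts)
total_tokens = sum(words.values())
for token, count in words.most_common(5):
    print(f"{token} | {count} | {100 * count / total_tokens:.1f}%")
```

Note that "new" collects rows from both "New" and "New construction", and the hyphenated "Re-activated" stays a single token under this split.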

Most occurring characters

Value | Count | Frequency (%)
e | 24311 | 14.0%
o | 22906 | 13.2%
s | 22160 | 12.8%
(space) | 22140 | 12.8%
r | 11497 | 6.6%
u | 11390 | 6.6%
l | 10802 | 6.2%
a | 10792 | 6.2%
H | 10734 | 6.2%
f | 10734 | 6.2%
Other values (17) | 15638 | 9.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 137401 | 79.4%
Space Separator | 22140 | 12.8%
Uppercase Letter | 13558 | 7.8%
Dash Punctuation | 5 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 24311 | 17.7%
o | 22906 | 16.7%
s | 22160 | 16.1%
r | 11497 | 8.4%
u | 11390 | 8.3%
l | 10802 | 7.9%
a | 10792 | 7.9%
f | 10734 | 7.8%
c | 3140 | 2.3%
t | 3029 | 2.2%
Other values (8) | 6640 | 4.8%

Uppercase Letter
Value | Count | Frequency (%)
H | 10734 | 79.2%
A | 1861 | 13.7%
N | 749 | 5.5%
C | 93 | 0.7%
F | 68 | 0.5%
P | 48 | 0.4%
R | 5 | < 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 22140 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 150959 | 87.2%
Common | 22145 | 12.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 24311 | 16.1%
o | 22906 | 15.2%
s | 22160 | 14.7%
r | 11497 | 7.6%
u | 11390 | 7.5%
l | 10802 | 7.2%
a | 10792 | 7.1%
H | 10734 | 7.1%
f | 10734 | 7.1%
c | 3140 | 2.1%
Other values (15) | 12493 | 8.3%

Common
Value | Count | Frequency (%)
(space) | 22140 | > 99.9%
- | 5 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 173104 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 24311 | 14.0%
o | 22906 | 13.2%
s | 22160 | 12.8%
(space) | 22140 | 12.8%
r | 11497 | 6.6%
u | 11390 | 6.6%
l | 10802 | 6.2%
a | 10792 | 6.2%
H | 10734 | 6.2%
f | 10734 | 6.2%
Other values (17) | 15638 | 9.0%

add_attr
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.3 KiB

Value | Count | Frequency (%)
True | 10330 | 76.5%
False | 3180 | 23.5%

city
Categorical

Distinct177
Distinct (%)1.3%
Missing0
Missing (%)0.0%
Memory size105.7 KiB
Dallas
 
764
Philadelphia
 
751
Chicago
 
751
Jacksonville
 
749
Indianapolis
 
747
Other values (172)
9748 

Length

Max length20
Median length18
Mean length9.0812731
Min length4

Characters and Unicode

Total characters122688
Distinct characters50
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique43 ?
Unique (%)0.3%

Sample

1st row: South Ozone Park
2nd row: Jamaica
3rd row: Staten Island
4th row: Flushing
5th row: Brooklyn

Common Values

ValueCountFrequency (%)
Dallas 764
 
5.7%
Philadelphia 751
 
5.6%
Chicago 751
 
5.6%
Jacksonville 749
 
5.5%
Indianapolis 747
 
5.5%
San Antonio 720
 
5.3%
Charlotte 718
 
5.3%
Fort Worth 714
 
5.3%
Houston 708
 
5.2%
Columbus 687
 
5.1%
Other values (167) 6201
45.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
san 2046
 
11.8%
dallas 764
 
4.4%
jacksonville 753
 
4.3%
philadelphia 751
 
4.3%
chicago 751
 
4.3%
indianapolis 747
 
4.3%
antonio 720
 
4.1%
charlotte 718
 
4.1%
fort 714
 
4.1%
worth 714
 
4.1%
Other values (193) 8725
50.1%

Most occurring characters

ValueCountFrequency (%)
a 11990
 
9.8%
o 11717
 
9.6%
n 10731
 
8.7%
l 9795
 
8.0%
i 8923
 
7.3%
e 8099
 
6.6%
s 6932
 
5.7%
t 6587
 
5.4%
h 5431
 
4.4%
r 3919
 
3.2%
Other values (40) 38564
31.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 101372
82.6%
Uppercase Letter 17423
 
14.2%
Space Separator 3893
 
3.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 11990
11.8%
o 11717
11.6%
n 10731
10.6%
l 9795
9.7%
i 8923
8.8%
e 8099
8.0%
s 6932
6.8%
t 6587
 
6.5%
h 5431
 
5.4%
r 3919
 
3.9%
Other values (15) 17248
17.0%
Uppercase Letter
ValueCountFrequency (%)
S 2688
15.4%
C 2283
13.1%
D 2064
11.8%
P 1551
8.9%
A 1487
8.5%
J 1174
6.7%
F 1157
6.6%
W 1132
6.5%
H 997
 
5.7%
I 944
 
5.4%
Other values (14) 1946
11.2%
Space Separator
ValueCountFrequency (%)
3893
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 118795
96.8%
Common 3893
 
3.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 11990
 
10.1%
o 11717
 
9.9%
n 10731
 
9.0%
l 9795
 
8.2%
i 8923
 
7.5%
e 8099
 
6.8%
s 6932
 
5.8%
t 6587
 
5.5%
h 5431
 
4.6%
r 3919
 
3.3%
Other values (39) 34671
29.2%
Common
ValueCountFrequency (%)
3893
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 122688
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 11990
 
9.8%
o 11717
 
9.6%
n 10731
 
8.7%
l 9795
 
8.0%
i 8923
 
7.3%
e 8099
 
6.6%
s 6932
 
5.7%
t 6587
 
5.4%
h 5431
 
4.4%
r 3919
 
3.2%
Other values (40) 38564
31.4%

state
Categorical

Distinct14
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size105.7 KiB
TX
3437 
CA
2152 
AZ
772 
OH
766 
FL
756 
Other values (9)
5627 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters27020
Distinct characters15
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row: NY
2nd row: NY
3rd row: NY
4th row: NY
5th row: NY

Common Values

ValueCountFrequency (%)
TX 3437
25.4%
CA 2152
15.9%
AZ 772
 
5.7%
OH 766
 
5.7%
FL 756
 
5.6%
IN 754
 
5.6%
IL 752
 
5.6%
PA 751
 
5.6%
NC 730
 
5.4%
NY 721
 
5.3%
Other values (4) 1919
14.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
tx 3437
25.4%
ca 2152
15.9%
az 772
 
5.7%
oh 766
 
5.7%
fl 756
 
5.6%
in 754
 
5.6%
il 752
 
5.6%
pa 751
 
5.6%
nc 730
 
5.4%
ny 721
 
5.3%
Other values (4) 1919
14.2%

Most occurring characters

ValueCountFrequency (%)
T 4125
15.3%
A 3995
14.8%
C 3793
14.0%
X 3437
12.7%
N 2893
10.7%
L 1508
 
5.6%
I 1506
 
5.6%
O 1365
 
5.1%
Z 772
 
2.9%
H 766
 
2.8%
Other values (5) 2860
10.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 27020
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 4125
15.3%
A 3995
14.8%
C 3793
14.0%
X 3437
12.7%
N 2893
10.7%
L 1508
 
5.6%
I 1506
 
5.6%
O 1365
 
5.1%
Z 772
 
2.9%
H 766
 
2.8%
Other values (5) 2860
10.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 27020
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 4125
15.3%
A 3995
14.8%
C 3793
14.0%
X 3437
12.7%
N 2893
10.7%
L 1508
 
5.6%
I 1506
 
5.6%
O 1365
 
5.1%
Z 772
 
2.9%
H 766
 
2.8%
Other values (5) 2860
10.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27020
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
T 4125
15.3%
A 3995
14.8%
C 3793
14.0%
X 3437
12.7%
N 2893
10.7%
L 1508
 
5.6%
I 1506
 
5.6%
O 1365
 
5.1%
Z 772
 
2.9%
H 766
 
2.8%
Other values (5) 2860
10.6%

lat
Real number (ℝ)

Distinct13383
Distinct (%)99.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean35.791818
Minimum0
Maximum47.733967
Zeros1
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size105.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 29.609097
Q1: 32.743026
median: 35.164196
Q3: 39.885906
95-th percentile: 41.906206
Maximum: 47.733967
Range: 47.733967
Interquartile range (IQR): 7.1428803

Descriptive statistics

Standard deviation4.4114308
Coefficient of variation (CV)0.12325249
Kurtosis-0.33799972
Mean35.791818
Median Absolute Deviation (MAD)4.587078
Skewness0.28773032
Sum483547.46
Variance19.460721
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
40.015083 3
 
< 0.1%
39.95425 2
 
< 0.1%
32.97955 2
 
< 0.1%
41.95046 2
 
< 0.1%
36.124767 2
 
< 0.1%
34.040268 2
 
< 0.1%
39.94218 2
 
< 0.1%
37.73284 2
 
< 0.1%
39.734142 2
 
< 0.1%
39.76294 2
 
< 0.1%
Other values (13373) 13489
99.8%
ValueCountFrequency (%)
0 1
< 0.1%
29.138699 1
< 0.1%
29.1487 1
< 0.1%
29.166689 1
< 0.1%
29.198627 1
< 0.1%
29.245447 1
< 0.1%
29.268686 1
< 0.1%
29.270285 1
< 0.1%
29.275421 1
< 0.1%
29.28695 1
< 0.1%
ValueCountFrequency (%)
47.733967 1
< 0.1%
47.733162 1
< 0.1%
47.732807 1
< 0.1%
47.731823 1
< 0.1%
47.73067 1
< 0.1%
47.72927 1
< 0.1%
47.72821 1
< 0.1%
47.72813 1
< 0.1%
47.72619 1
< 0.1%
47.7254 1
< 0.1%

long
Real number (ℝ)

Distinct13341
Distinct (%)98.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-95.262188
Minimum-122.50857
Maximum0
Zeros1
Zeros (%)< 0.1%
Negative13509
Negative (%)> 99.9%
Memory size105.7 KiB

Quantile statistics

Minimum: -122.50857
5-th percentile: -121.94487
Q1: -105.02284
median: -95.506082
Q3: -82.921524
95-th percentile: -74.160715
Maximum: 0
Range: 122.50857
Interquartile range (IQR): 22.101311

Descriptive statistics

Standard deviation15.052261
Coefficient of variation (CV)-0.15800877
Kurtosis-0.92719424
Mean-95.262188
Median Absolute Deviation (MAD)12.542309
Skewness-0.38028838
Sum-1286992.2
Variance226.57056
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-86.00916 2
 
< 0.1%
-83.07756 2
 
< 0.1%
-87.72052 2
 
< 0.1%
-117.143074 2
 
< 0.1%
-80.83985 2
 
< 0.1%
-97.333595 2
 
< 0.1%
-73.85052 2
 
< 0.1%
-95.50236 2
 
< 0.1%
-104.96717 2
 
< 0.1%
-86.66064 2
 
< 0.1%
Other values (13331) 13490
99.9%
ValueCountFrequency (%)
-122.508575 1
< 0.1%
-122.50841 1
< 0.1%
-122.50772 1
< 0.1%
-122.50759 1
< 0.1%
-122.50677 1
< 0.1%
-122.50636 1
< 0.1%
-122.50594 1
< 0.1%
-122.505615 1
< 0.1%
-122.50462 1
< 0.1%
-122.50259 1
< 0.1%
ValueCountFrequency (%)
0 1
< 0.1%
-73.702156 1
< 0.1%
-73.7041 1
< 0.1%
-73.704544 1
< 0.1%
-73.70573 1
< 0.1%
-73.70951 1
< 0.1%
-73.71235 1
< 0.1%
-73.7136 1
< 0.1%
-73.7185 1
< 0.1%
-73.719795 1
< 0.1%

bath
Real number (ℝ)

Distinct23
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.8907476
Minimum1
Maximum27
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size105.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 3
95-th percentile: 6
Maximum: 27
Range: 26
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation1.6398857
Coefficient of variation (CV)0.56728776
Kurtosis19.134064
Mean2.8907476
Median Absolute Deviation (MAD)1
Skewness3.0045684
Sum39054
Variance2.6892252
MonotonicityNot monotonic
Histogram with fixed size bins (bins=23)
ValueCountFrequency (%)
2 5124
37.9%
3 3935
29.1%
4 1650
 
12.2%
1 1424
 
10.5%
5 590
 
4.4%
6 308
 
2.3%
7 195
 
1.4%
8 110
 
0.8%
9 65
 
0.5%
10 35
 
0.3%
Other values (13) 74
 
0.5%
ValueCountFrequency (%)
1 1424
 
10.5%
2 5124
37.9%
3 3935
29.1%
4 1650
 
12.2%
5 590
 
4.4%
6 308
 
2.3%
7 195
 
1.4%
8 110
 
0.8%
9 65
 
0.5%
10 35
 
0.3%
ValueCountFrequency (%)
27 1
 
< 0.1%
25 1
 
< 0.1%
24 1
 
< 0.1%
20 1
 
< 0.1%
19 1
 
< 0.1%
18 3
 
< 0.1%
17 3
 
< 0.1%
16 2
 
< 0.1%
15 4
< 0.1%
14 8
0.1%

bed
Real number (ℝ)

Distinct14
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.6632124
Minimum1
Maximum17
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size105.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
median: 3
Q3: 4
95-th percentile: 6
Maximum: 17
Range: 16
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation1.1093722
Coefficient of variation (CV)0.30284135
Kurtosis7.1839472
Mean3.6632124
Median Absolute Deviation (MAD)1
Skewness1.5120431
Sum49490
Variance1.2307067
MonotonicityNot monotonic
Histogram with fixed size bins (bins=14)
ValueCountFrequency (%)
3 5600
41.5%
4 4441
32.9%
5 1586
 
11.7%
2 1089
 
8.1%
6 485
 
3.6%
7 135
 
1.0%
1 76
 
0.6%
8 55
 
0.4%
9 19
 
0.1%
10 9
 
0.1%
Other values (4) 15
 
0.1%
ValueCountFrequency (%)
1 76
 
0.6%
2 1089
 
8.1%
3 5600
41.5%
4 4441
32.9%
5 1586
 
11.7%
6 485
 
3.6%
7 135
 
1.0%
8 55
 
0.4%
9 19
 
0.1%
10 9
 
0.1%
ValueCountFrequency (%)
17 1
 
< 0.1%
14 3
 
< 0.1%
12 6
 
< 0.1%
11 5
 
< 0.1%
10 9
 
0.1%
9 19
 
0.1%
8 55
 
0.4%
7 135
 
1.0%
6 485
 
3.6%
5 1586
11.7%

living
Real number (ℝ)

Distinct3983
Distinct (%)29.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2449.4537
Minimum1
Maximum56500
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size105.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 983
Q1: 1440
median: 1986.5
Q3: 2766
95-th percentile: 5350.65
Maximum: 56500
Range: 56499
Interquartile range (IQR): 1326

Descriptive statistics

Standard deviation1975.3209
Coefficient of variation (CV)0.80643323
Kurtosis89.682313
Mean2449.4537
Median Absolute Deviation (MAD)617.5
Skewness6.4488085
Sum33092120
Variance3901892.7
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1200 61
 
0.5%
1800 43
 
0.3%
2000 41
 
0.3%
3000 39
 
0.3%
1600 38
 
0.3%
1500 37
 
0.3%
1440 37
 
0.3%
2200 35
 
0.3%
2100 34
 
0.3%
2400 33
 
0.2%
Other values (3973) 13112
97.1%
ValueCountFrequency (%)
1 2
< 0.1%
300 1
 
< 0.1%
311 1
 
< 0.1%
392 1
 
< 0.1%
400 1
 
< 0.1%
480 1
 
< 0.1%
490 1
 
< 0.1%
500 1
 
< 0.1%
504 3
< 0.1%
512 1
 
< 0.1%
ValueCountFrequency (%)
56500 1
< 0.1%
38000 1
< 0.1%
37132 1
< 0.1%
36000 1
< 0.1%
31450 1
< 0.1%
30000 1
< 0.1%
29108 1
< 0.1%
28720 1
< 0.1%
25000 1
< 0.1%
22897 1
< 0.1%

lot_a
Real number (ℝ)

Distinct5513
Distinct (%)40.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean9614.1097
Minimum3
Maximum129373.2
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size105.7 KiB

Quantile statistics

Minimum: 3
5-th percentile: 1829.736
Q1: 4617.36
median: 7012.08
Q3: 9583.2
95-th percentile: 26136
Maximum: 129373.2
Range: 129370.2
Interquartile range (IQR): 4965.84

Descriptive statistics

Standard deviation: 11546.123
Coefficient of variation (CV): 1.2009561
Kurtosis: 31.510753
Mean: 9614.1097
Median Absolute Deviation (MAD): 2531.3264
Skewness: 4.8914119
Sum: 1.2988662 × 10^8
Variance: 1.3331296 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
7405.2 143
 
1.1%
6534 140
 
1.0%
6098.4 136
 
1.0%
4791.6 131
 
1.0%
5227.2 125
 
0.9%
7840.8 121
 
0.9%
4356 115
 
0.9%
6969.6 110
 
0.8%
8276.4 109
 
0.8%
10018.8 109
 
0.8%
Other values (5503) 12271
90.8%
ValueCountFrequency (%)
3 3
< 0.1%
3.03 1
 
< 0.1%
3.0344 1
 
< 0.1%
3.1 1
 
< 0.1%
3.11 1
 
< 0.1%
3.125 1
 
< 0.1%
3.16 2
< 0.1%
3.19 1
 
< 0.1%
3.1975 1
 
< 0.1%
3.2 1
 
< 0.1%
ValueCountFrequency (%)
129373.2 1
< 0.1%
128850.48 1
< 0.1%
128502 1
< 0.1%
126324 1
< 0.1%
124973.64 1
< 0.1%
123710.4 1
< 0.1%
122564.772 1
< 0.1%
121968 1
< 0.1%
119999.088 1
< 0.1%
119790 1
< 0.1%

fireplace
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct71
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size105.7 KiB
none
6106 
1.0
4303 
2.0
741 
living room
 
409
not applicable
 
370
Other values (66)
1581 

Length

Max length26
Median length25
Mean length4.5897113
Min length3

Characters and Unicode

Total characters62007
Distinct characters38
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique22 ?
Unique (%)0.2%

Sample

1st row: none
2nd row: none
3rd row: none
4th row: none
5th row: none

Common Values

ValueCountFrequency (%)
none 6106
45.2%
1.0 4303
31.9%
2.0 741
 
5.5%
living room 409
 
3.0%
not applicable 370
 
2.7%
family room 318
 
2.4%
1 fireplace 253
 
1.9%
3.0 229
 
1.7%
gas log 97
 
0.7%
4.0 84
 
0.6%
Other values (61) 600
 
4.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none 6106
39.9%
1.0 4303
28.1%
room 853
 
5.6%
2.0 741
 
4.8%
living 409
 
2.7%
not 370
 
2.4%
applicable 370
 
2.4%
family 320
 
2.1%
fireplace 295
 
1.9%
1 253
 
1.7%
Other values (71) 1294
 
8.4%

Most occurring characters

ValueCountFrequency (%)
n 13245
21.4%
o 8533
13.8%
e 7536
12.2%
0 5451
8.8%
. 5450
8.8%
1 4558
 
7.4%
i 2072
 
3.3%
l 1901
 
3.1%
1804
 
2.9%
a 1676
 
2.7%
Other values (28) 9781
15.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 43551
70.2%
Decimal Number 11190
 
18.0%
Other Punctuation 5458
 
8.8%
Space Separator 1804
 
2.9%
Math Symbol 2
 
< 0.1%
Open Punctuation 1
 
< 0.1%
Close Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 13245
30.4%
o 8533
19.6%
e 7536
17.3%
i 2072
 
4.8%
l 1901
 
4.4%
a 1676
 
3.8%
r 1544
 
3.5%
m 1251
 
2.9%
p 1092
 
2.5%
g 818
 
1.9%
Other values (12) 3883
 
8.9%
Decimal Number
ValueCountFrequency (%)
0 5451
48.7%
1 4558
40.7%
2 776
 
6.9%
3 230
 
2.1%
4 84
 
0.8%
5 55
 
0.5%
6 16
 
0.1%
7 13
 
0.1%
8 5
 
< 0.1%
9 2
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 5450
99.9%
/ 8
 
0.1%
Space Separator
ValueCountFrequency (%)
1804
100.0%
Math Symbol
ValueCountFrequency (%)
+ 2
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 43551
70.2%
Common 18456
29.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 13245
30.4%
o 8533
19.6%
e 7536
17.3%
i 2072
 
4.8%
l 1901
 
4.4%
a 1676
 
3.8%
r 1544
 
3.5%
m 1251
 
2.9%
p 1092
 
2.5%
g 818
 
1.9%
Other values (12) 3883
 
8.9%
Common
ValueCountFrequency (%)
0 5451
29.5%
. 5450
29.5%
1 4558
24.7%
1804
 
9.8%
2 776
 
4.2%
3 230
 
1.2%
4 84
 
0.5%
5 55
 
0.3%
6 16
 
0.1%
7 13
 
0.1%
Other values (6) 19
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 62007
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 13245
21.4%
o 8533
13.8%
e 7536
12.2%
0 5451
8.8%
. 5450
8.8%
1 4558
 
7.4%
i 2072
 
3.3%
l 1901
 
3.1%
1804
 
2.9%
a 1676
 
2.7%
Other values (28) 9781
15.8%

parking
Categorical

Distinct272
Distinct (%)2.0%
Missing0
Missing (%)0.0%
Memory size105.7 KiB
gasrage
1419 
none
 
914
attache
 
684
attached gasrag
 
567
drivewa
 
539
Other values (267)
9387 

Length

Max length36
Median length27
Mean length11.005774
Min length3

Characters and Unicode

Total characters148688
Distinct characters38
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique46 ?
Unique (%)0.3%

Sample

1st row: gasrage - detached
2nd row: none
3rd row: dentache
4th row: none
5th row: shared drivewa

Common Values

ValueCountFrequency (%)
gasrage 1419
 
10.5%
none 914
 
6.8%
attache 684
 
5.1%
attached gasrag 567
 
4.2%
drivewa 539
 
4.0%
attached 506
 
3.7%
gasrage - attached 490
 
3.6%
on street 407
 
3.0%
driveway 402
 
3.0%
2-car single door 385
 
2.8%
Other values (262) 7197
53.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
gasrage 3814
 
15.4%
attached 1960
 
7.9%
gasrag 1959
 
7.9%
car 1256
 
5.1%
attache 1095
 
4.4%
none 962
 
3.9%
893
 
3.6%
door 853
 
3.5%
2 744
 
3.0%
2-car 712
 
2.9%
Other values (189) 10475
42.4%

Most occurring characters

ValueCountFrequency (%)
a 24741
16.6%
e 16753
11.3%
r 14288
9.6%
g 12663
8.5%
11215
7.5%
t 10704
 
7.2%
s 8784
 
5.9%
c 8700
 
5.9%
o 6824
 
4.6%
d 6253
 
4.2%
Other values (28) 27763
18.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 133242
89.6%
Space Separator 11215
 
7.5%
Decimal Number 1835
 
1.2%
Dash Punctuation 1724
 
1.2%
Other Punctuation 595
 
0.4%
Math Symbol 65
 
< 0.1%
Open Punctuation 10
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 24741
18.6%
e 16753
12.6%
r 14288
10.7%
g 12663
9.5%
t 10704
8.0%
s 8784
 
6.6%
c 8700
 
6.5%
o 6824
 
5.1%
d 6253
 
4.7%
n 6002
 
4.5%
Other values (15) 17530
13.2%
Decimal Number
ValueCountFrequency (%)
2 1458
79.5%
1 263
 
14.3%
3 82
 
4.5%
4 31
 
1.7%
5 1
 
0.1%
Other Punctuation
ValueCountFrequency (%)
: 547
91.9%
/ 41
 
6.9%
& 7
 
1.2%
Space Separator
ValueCountFrequency (%)
11215
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1724
100.0%
Math Symbol
ValueCountFrequency (%)
+ 65
100.0%
Open Punctuation
ValueCountFrequency (%)
( 10
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 133242
89.6%
Common 15446
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 24741
18.6%
e 16753
12.6%
r 14288
10.7%
g 12663
9.5%
t 10704
8.0%
s 8784
 
6.6%
c 8700
 
6.5%
o 6824
 
5.1%
d 6253
 
4.7%
n 6002
 
4.5%
Other values (15) 17530
13.2%
Common
ValueCountFrequency (%)
11215
72.6%
- 1724
 
11.2%
2 1458
 
9.4%
: 547
 
3.5%
1 263
 
1.7%
3 82
 
0.5%
+ 65
 
0.4%
/ 41
 
0.3%
4 31
 
0.2%
( 10
 
0.1%
Other values (3) 10
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 148688
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 24741
16.6%
e 16753
11.3%
r 14288
9.6%
g 12663
8.5%
11215
7.5%
t 10704
 
7.2%
s 8784
 
5.9%
c 8700
 
5.9%
o 6824
 
4.6%
d 6253
 
4.2%
Other values (28) 27763
18.7%

covered
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct18
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.0199852
Minimum1
Maximum424
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size105.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 2
Q3: 2
95-th percentile: 3
Maximum: 424
Range: 423
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation3.7397042
Coefficient of variation (CV)1.8513523
Kurtosis12007.106
Mean2.0199852
Median Absolute Deviation (MAD)0
Skewness106.68066
Sum27290
Variance13.985388
MonotonicityNot monotonic
Histogram with fixed size bins (bins=18)
ValueCountFrequency (%)
2 10045
74.4%
1 2191
 
16.2%
3 892
 
6.6%
4 251
 
1.9%
5 59
 
0.4%
6 25
 
0.2%
7 16
 
0.1%
8 15
 
0.1%
9 4
 
< 0.1%
12 2
 
< 0.1%
Other values (8) 10
 
0.1%
ValueCountFrequency (%)
1 2191
 
16.2%
2 10045
74.4%
3 892
 
6.6%
4 251
 
1.9%
5 59
 
0.4%
6 25
 
0.2%
7 16
 
0.1%
8 15
 
0.1%
9 4
 
< 0.1%
10 1
 
< 0.1%
ValueCountFrequency (%)
424 1
 
< 0.1%
60 1
 
< 0.1%
20 1
 
< 0.1%
16 1
 
< 0.1%
14 1
 
< 0.1%
13 2
< 0.1%
12 2
< 0.1%
11 2
< 0.1%
10 1
 
< 0.1%
9 4
< 0.1%
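
The three value tables for covered together list all 18 distinct values, so the full distribution can be rebuilt from this report. The sketch below recomputes the skewness from those counts using the bias-adjusted sample formula; that this matches the tool's estimator is an assumption, and `sample_skewness` is a hypothetical helper name. It also shows how much of the skew comes from the single value 424.

```python
import math

# value -> row count for covered, merged from the value tables in this report.
covered_counts = {
    1: 2191, 2: 10045, 3: 892, 4: 251, 5: 59, 6: 25, 7: 16, 8: 15, 9: 4,
    10: 1, 11: 2, 12: 2, 13: 2, 14: 1, 16: 1, 20: 1, 60: 1, 424: 1,
}

def sample_skewness(value_counts):
    """Bias-adjusted sample skewness computed from a value -> count mapping."""
    n = sum(value_counts.values())
    mean = sum(v * c for v, c in value_counts.items()) / n
    m2 = sum(c * (v - mean) ** 2 for v, c in value_counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in value_counts.items()) / n
    g1 = m3 / m2 ** 1.5                             # moment-based skewness
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)    # small-sample adjustment

skew_all = sample_skewness(covered_counts)

# Drop the single largest value to see how much it drives the statistic.
without_max = {v: c for v, c in covered_counts.items() if v != 424}
skew_trimmed = sample_skewness(without_max)

print(f"skewness, all rows:    {skew_all:.2f}")
print(f"skewness, without 424: {skew_trimmed:.2f}")
```

With an IQR of 0 and a median of 2, nearly all of the skew reported for this variable comes from a handful of extreme rows rather than from the bulk of the data.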

garage
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct16
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.9233161
Minimum1
Maximum424
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size105.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 2
Q3: 2
95-th percentile: 3
Maximum: 424
Range: 423
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation4.1478572
Coefficient of variation (CV)2.1566176
Kurtosis8427.5132
Mean1.9233161
Median Absolute Deviation (MAD)0
Skewness87.843613
Sum25984
Variance17.204719
MonotonicityNot monotonic
Histogram with fixed size bins (bins=16)
ValueCountFrequency (%)
2 9010
66.7%
1 3351
 
24.8%
3 880
 
6.5%
4 180
 
1.3%
5 35
 
0.3%
6 20
 
0.1%
8 12
 
0.1%
7 9
 
0.1%
9 4
 
< 0.1%
10 2
 
< 0.1%
Other values (6) 7
 
0.1%
ValueCountFrequency (%)
1 3351
 
24.8%
2 9010
66.7%
3 880
 
6.5%
4 180
 
1.3%
5 35
 
0.3%
6 20
 
0.1%
7 9
 
0.1%
8 12
 
0.1%
9 4
 
< 0.1%
10 2
 
< 0.1%
ValueCountFrequency (%)
424 1
 
< 0.1%
212 1
 
< 0.1%
60 1
 
< 0.1%
13 1
 
< 0.1%
12 1
 
< 0.1%
11 2
 
< 0.1%
10 2
 
< 0.1%
9 4
 
< 0.1%
8 12
0.1%
7 9
0.1%

total_spaces
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct32
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.4140637
Minimum1
Maximum424
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size105.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 2
Q3: 2
95-th percentile: 5
Maximum: 424
Range: 423
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation4.7595506
Coefficient of variation (CV)1.9715928
Kurtosis5077.1154
Mean2.4140637
Median Absolute Deviation (MAD)0
Skewness64.108215
Sum32614
Variance22.653322
MonotonicityNot monotonic
Histogram with fixed size bins (bins=32)
ValueCountFrequency (%)
2 8908
65.9%
1 1910
 
14.1%
3 1065
 
7.9%
4 906
 
6.7%
6 266
 
2.0%
5 209
 
1.5%
8 78
 
0.6%
7 51
 
0.4%
10 28
 
0.2%
9 23
 
0.2%
Other values (22) 66
 
0.5%
ValueCountFrequency (%)
1 1910
 
14.1%
2 8908
65.9%
3 1065
 
7.9%
4 906
 
6.7%
5 209
 
1.5%
6 266
 
2.0%
7 51
 
0.4%
8 78
 
0.6%
9 23
 
0.2%
10 28
 
0.2%
ValueCountFrequency (%)
424 1
< 0.1%
212 1
< 0.1%
203 1
< 0.1%
80 1
< 0.1%
60 1
< 0.1%
55 1
< 0.1%
45 1
< 0.1%
40 1
< 0.1%
35 1
< 0.1%
33 1
< 0.1%

subtype
Categorical

Distinct27
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size105.7 KiB
Single Family Residence
10857 
Residentia
1089 
Detached
 
506
none
 
390
Residential
 
361
Other values (22)
 
307

Length

Max length29
Median length23
Mean length20.427387
Min length4

Characters and Unicode

Total characters275974
Distinct characters39
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6 ?
Unique (%)< 0.1%

Sample

1st row: none
2nd row: none
3rd row: Single Family - Detached
4th row: none
5th row: Single Family Residence

Common Values

ValueCountFrequency (%)
Single Family Residence 10857
80.4%
Residentia 1089
 
8.1%
Detached 506
 
3.7%
none 390
 
2.9%
Residential 361
 
2.7%
Single Family - Detached 119
 
0.9%
Ranch 45
 
0.3%
All Other Attached 40
 
0.3%
Single Family - Semi-Attached 23
 
0.2%
Residential-Detache 14
 
0.1%
Other values (17) 66
 
0.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
family 11025
30.8%
single 11022
30.8%
residence 10857
30.3%
residentia 1089
 
3.0%
detached 625
 
1.7%
none 390
 
1.1%
residential 361
 
1.0%
148
 
0.4%
attached 46
 
0.1%
ranch 45
 
0.1%
Other values (30) 227
 
0.6%

Most occurring characters

ValueCountFrequency (%)
e 48389
17.5%
i 35901
13.0%
n 24184
8.8%
l 22533
 
8.2%
22325
 
8.1%
a 13280
 
4.8%
d 13039
 
4.7%
R 12370
 
4.5%
s 12334
 
4.5%
c 11625
 
4.2%
Other values (29) 59994
21.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 218138
79.0%
Uppercase Letter 35324
 
12.8%
Space Separator 22325
 
8.1%
Dash Punctuation 185
 
0.1%
Decimal Number 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 48389
22.2%
i 35901
16.5%
n 24184
11.1%
l 22533
10.3%
a 13280
 
6.1%
d 13039
 
6.0%
s 12334
 
5.7%
c 11625
 
5.3%
m 11062
 
5.1%
y 11025
 
5.1%
Other values (11) 14766
 
6.8%
Uppercase Letter
ValueCountFrequency (%)
R 12370
35.0%
S 11068
31.3%
F 11026
31.2%
D 648
 
1.8%
A 111
 
0.3%
O 41
 
0.1%
P 12
 
< 0.1%
H 12
 
< 0.1%
C 12
 
< 0.1%
W 7
 
< 0.1%
Other values (5) 17
 
< 0.1%
Space Separator
ValueCountFrequency (%)
22325
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 185
100.0%
Decimal Number
ValueCountFrequency (%)
2 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 253462
91.8%
Common 22512
 
8.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 48389
19.1%
i 35901
14.2%
n 24184
9.5%
l 22533
8.9%
a 13280
 
5.2%
d 13039
 
5.1%
R 12370
 
4.9%
s 12334
 
4.9%
c 11625
 
4.6%
S 11068
 
4.4%
Other values (26) 48739
19.2%
Common
ValueCountFrequency (%)
22325
99.2%
- 185
 
0.8%
2 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 275974
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 48389
17.5%
i 35901
13.0%
n 24184
8.8%
l 22533
 
8.2%
22325
 
8.1%
a 13280
 
4.8%
d 13039
 
4.7%
R 12370
 
4.5%
s 12334
 
4.5%
c 11625
 
4.2%
Other values (29) 59994
21.7%

year
Real number (ℝ)

Distinct: 172
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1965.5639
Minimum: 0
Maximum: 9999
Zeros: 17
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 105.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1905
Q1: 1941
Median: 1965
Q3: 1999
95-th percentile: 2022
Maximum: 9999
Range: 9999
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 125.26858
Coefficient of variation (CV): 0.063731627
Kurtosis: 2578.4939
Mean: 1965.5639
Median Absolute Deviation (MAD): 29
Skewness: 34.186623
Sum: 26554768
Variance: 15692.218
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2022 675
 
5.0%
1925 344
 
2.5%
1950 280
 
2.1%
1965 268
 
2.0%
1920 259
 
1.9%
1955 238
 
1.8%
1900 231
 
1.7%
1940 218
 
1.6%
2005 207
 
1.5%
2006 206
 
1.5%
Other values (162) 10584
78.3%
ValueCountFrequency (%)
0 17
0.1%
1730 1
 
< 0.1%
1750 1
 
< 0.1%
1807 1
 
< 0.1%
1829 1
 
< 0.1%
1830 1
 
< 0.1%
1836 1
 
< 0.1%
1843 1
 
< 0.1%
1847 1
 
< 0.1%
1848 1
 
< 0.1%
ValueCountFrequency (%)
9999 2
 
< 0.1%
2023 69
 
0.5%
2022 675
5.0%
2021 82
 
0.6%
2020 105
 
0.8%
2019 118
 
0.9%
2018 114
 
0.8%
2017 110
 
0.8%
2016 110
 
0.8%
2015 110
 
0.8%
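
The extreme values listed above (17 zeros and two 9999 entries, against a 5th to 95th percentile span of 1905 to 2022) look like placeholder codes rather than real construction years, which would also be consistent with the very large kurtosis and skewness. A minimal Python sketch of masking them before recomputing statistics, assuming that interpretation holds; the helper name mask_implausible_years, the bounds 1700 and 2023, and the six sample values are illustrative choices, not part of the report:

```python
from statistics import median

def mask_implausible_years(years, lo=1700, hi=2023):
    """Replace years outside [lo, hi] with None so they count as missing.

    The bounds are assumptions chosen for illustration only.
    """
    return [y if lo <= y <= hi else None for y in years]

# Small illustrative sample mixing plausible years with the two suspect codes.
sample = [0, 1730, 1930, 1965, 2022, 9999]

cleaned = mask_implausible_years(sample)
valid = [y for y in cleaned if y is not None]

print(cleaned)        # suspect codes become None
print(median(valid))  # median over the remaining plausible years
```

Masking rather than dropping keeps the row count unchanged, so the other columns of those records remain usable.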

sewer
Categorical

HIGH CARDINALITY  HIGH CORRELATION  IMBALANCE 

Distinct: 88
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 105.7 KiB
public sewer 7311
none 2648
city sewer 1454
sewer connected 561
saw 210
Other values (83) 1326

Length

Max length: 33
Median length: 12
Mean length: 10.158327
Min length: 1

Characters and Unicode

Total characters: 137239
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 26
Unique (%): 0.2%

Sample

1st row: none
2nd row: none
3rd row: public sewer
4th row: none
5th row: public sewer

Common Values

Value              Count  Frequency (%)
public sewer        7311  54.1%
none                2648  19.6%
city sewer          1454  10.8%
sewer connected      561  4.2%
saw                  210  1.6%
sewer system         183  1.4%
septic system        127  0.9%
septic tank          125  0.9%
saws                 103  0.8%
in street             80  0.6%
Other values (78)    708  5.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
sewer 9677
40.2%
public 7408
30.8%
none 2648
 
11.0%
city 1465
 
6.1%
connected 586
 
2.4%
septic 434
 
1.8%
system 338
 
1.4%
saw 210
 
0.9%
in 128
 
0.5%
tank 125
 
0.5%
Other values (74) 1027
 
4.3%

Most occurring characters

ValueCountFrequency (%)
e 24629
17.9%
s 11563
8.4%
c 10699
7.8%
10536
7.7%
r 10127
 
7.4%
w 10111
 
7.4%
i 10035
 
7.3%
p 8048
 
5.9%
l 7648
 
5.6%
u 7642
 
5.6%
Other values (19) 26201
19.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 126540
92.2%
Space Separator 10536
 
7.7%
Open Punctuation 62
 
< 0.1%
Close Punctuation 48
 
< 0.1%
Other Punctuation 40
 
< 0.1%
Dash Punctuation 13
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 24629
19.5%
s 11563
9.1%
c 10699
8.5%
r 10127
8.0%
w 10111
8.0%
i 10035
7.9%
p 8048
 
6.4%
l 7648
 
6.0%
u 7642
 
6.0%
b 7469
 
5.9%
Other values (13) 18569
14.7%
Other Punctuation
ValueCountFrequency (%)
& 22
55.0%
/ 18
45.0%
Space Separator
ValueCountFrequency (%)
10536
100.0%
Open Punctuation
ValueCountFrequency (%)
( 62
100.0%
Close Punctuation
ValueCountFrequency (%)
) 48
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 126540
92.2%
Common 10699
 
7.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 24629
19.5%
s 11563
9.1%
c 10699
8.5%
r 10127
8.0%
w 10111
8.0%
i 10035
7.9%
p 8048
 
6.4%
l 7648
 
6.0%
u 7642
 
6.0%
b 7469
 
5.9%
Other values (13) 18569
14.7%
Common
ValueCountFrequency (%)
10536
98.5%
( 62
 
0.6%
) 48
 
0.4%
& 22
 
0.2%
/ 18
 
0.2%
- 13
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 137239
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 24629
17.9%
s 11563
8.4%
c 10699
7.8%
10536
7.7%
r 10127
 
7.4%
w 10111
 
7.4%
i 10035
 
7.3%
p 8048
 
5.9%
l 7648
 
5.6%
u 7642
 
5.6%
Other values (19) 26201
19.1%
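
The per-category character counts above (Lowercase Letter, Space Separator, Open Punctuation, and so on) correspond to Unicode general categories. A minimal sketch of how such counts could be reproduced with Python's standard unicodedata module; the function name category_counts, the name mapping, and the three sample strings are illustrative assumptions, and this is not necessarily how the report itself computes them:

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
    "Sm": "Math Symbol",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not in the mapping.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Made-up strings in the style of the sewer column.
counts = category_counts(["public sewer", "septic (tank)", "well & septic"])
for name, n in counts.most_common():
    print(name, n)
```

An unbalanced pair of Open Punctuation and Close Punctuation counts, as in the tables above (62 versus 48), is one quick hint that some values were truncated before the closing parenthesis.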

water
Categorical

HIGH CARDINALITY  HIGH CORRELATION  IMBALANCE 

Distinct: 74
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 105.7 KiB
Public 7031
none 2181
City Water 1110
Publi 561
Lake Michigan 389
Other values (69) 2238

Length

Max length: 32
Median length: 6
Mean length: 7.0184308
Min length: 3

Characters and Unicode

Total characters: 94819
Distinct characters: 47
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 25
Unique (%): 0.2%

Sample

1st row: none
2nd row: none
3rd row: none
4th row: none
5th row: Public

Common Values

Value              Count  Frequency (%)
Public              7031  52.0%
none                2181  16.1%
City Water          1110  8.2%
Publi                561  4.2%
Lake Michigan        389  2.9%
Meter on Property    386  2.9%
City Wate            335  2.5%
SAW                  262  1.9%
Water System         202  1.5%
Water District       175  1.3%
Other values (64)    878  6.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
public 7034
40.8%
none 2195
 
12.7%
water 1550
 
9.0%
city 1461
 
8.5%
publi 561
 
3.3%
lake 549
 
3.2%
meter 481
 
2.8%
on 460
 
2.7%
michigan 389
 
2.3%
property 386
 
2.2%
Other values (57) 2156
 
12.5%

Most occurring characters

ValueCountFrequency (%)
i 11224
11.8%
l 8161
 
8.6%
P 8098
 
8.5%
c 7957
 
8.4%
u 7759
 
8.2%
b 7598
 
8.0%
e 6692
 
7.1%
n 5413
 
5.7%
t 5379
 
5.7%
3712
 
3.9%
Other values (37) 22826
24.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 75388
79.5%
Uppercase Letter 15550
 
16.4%
Space Separator 3712
 
3.9%
Open Punctuation 82
 
0.1%
Close Punctuation 69
 
0.1%
Other Punctuation 13
 
< 0.1%
Dash Punctuation 5
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 11224
14.9%
l 8161
10.8%
c 7957
10.6%
u 7759
10.3%
b 7598
10.1%
e 6692
8.9%
n 5413
7.2%
t 5379
7.1%
r 3348
 
4.4%
a 3251
 
4.3%
Other values (13) 8606
11.4%
Uppercase Letter
ValueCountFrequency (%)
P 8098
52.1%
W 2425
 
15.6%
C 1512
 
9.7%
M 1219
 
7.8%
S 726
 
4.7%
L 550
 
3.5%
A 362
 
2.3%
D 356
 
2.3%
U 161
 
1.0%
I 44
 
0.3%
Other values (9) 97
 
0.6%
Space Separator
ValueCountFrequency (%)
3712
100.0%
Open Punctuation
ValueCountFrequency (%)
( 82
100.0%
Close Punctuation
ValueCountFrequency (%)
) 69
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 13
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 90938
95.9%
Common 3881
 
4.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 11224
12.3%
l 8161
9.0%
P 8098
8.9%
c 7957
8.7%
u 7759
8.5%
b 7598
 
8.4%
e 6692
 
7.4%
n 5413
 
6.0%
t 5379
 
5.9%
r 3348
 
3.7%
Other values (32) 19309
21.2%
Common
ValueCountFrequency (%)
3712
95.6%
( 82
 
2.1%
) 69
 
1.8%
/ 13
 
0.3%
- 5
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 94819
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 11224
11.8%
l 8161
 
8.6%
P 8098
 
8.5%
c 7957
 
8.4%
u 7759
 
8.2%
b 7598
 
8.0%
e 6692
 
7.1%
n 5413
 
5.7%
t 5379
 
5.7%
3712
 
3.9%
Other values (37) 22826
24.1%
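
The overview for water reports both "Distinct" (74) and "Unique" (25). A minimal sketch of the difference, assuming "Distinct" counts the different values present and "Unique" counts the values that occur exactly once; the function name distinct_and_unique and the six-element sample list are illustrative, not taken from the dataset as a whole:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Illustrative sample built from labels that appear in the tables above.
water_sample = ["Public", "Public", "none", "City Water", "Publi", "SAW"]

distinct, unique = distinct_and_unique(water_sample)
print(distinct, unique)
```

Because the comparison is exact, truncated labels such as "Publi" count as separate values from "Public", which is one way a column can end up flagged for high cardinality.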

app
Categorical

Distinct: 145
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 105.7 KiB
dishwasher 5128
none 1848
electric water heater 578
gas water heater 551
built-in microwave 545
Other values (140) 4860

Length

Max length: 39
Median length: 34
Mean length: 10.910215
Min length: 4

Characters and Unicode

Total characters: 147397
Distinct characters: 34
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 0.2%

Sample

1st row: none
2nd row: microwave
3rd row: dishwasher
4th row: dryer
5th row: dishwasher

Common Values

Value                  Count  Frequency (%)
dishwasher              5128  38.0%
none                    1848  13.7%
electric water heater    578  4.3%
gas water heater         551  4.1%
built-in microwave       545  4.0%
range                    538  4.0%
cooktop                  304  2.3%
electric cooktop         298  2.2%
gas cooktop              270  2.0%
dryer                    265  2.0%
Other values (135)      3185  23.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
dishwasher 5128
25.3%
none 1850
 
9.1%
gas 1365
 
6.7%
water 1302
 
6.4%
electric 1271
 
6.3%
heater 1246
 
6.2%
built-in 1072
 
5.3%
range 1060
 
5.2%
cooktop 876
 
4.3%
oven 826
 
4.1%
Other values (112) 4261
21.0%

Most occurring characters

ValueCountFrequency (%)
e 19196
13.0%
r 13886
 
9.4%
s 12706
 
8.6%
a 12603
 
8.6%
h 11749
 
8.0%
i 10947
 
7.4%
n 8418
 
5.7%
o 7604
 
5.2%
w 7296
 
4.9%
t 6754
 
4.6%
Other values (24) 36238
24.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 139114
94.4%
Space Separator 6747
 
4.6%
Dash Punctuation 1149
 
0.8%
Other Punctuation 174
 
0.1%
Open Punctuation 113
 
0.1%
Decimal Number 84
 
0.1%
Close Punctuation 16
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 19196
13.8%
r 13886
10.0%
s 12706
9.1%
a 12603
9.1%
h 11749
8.4%
i 10947
 
7.9%
n 8418
 
6.1%
o 7604
 
5.5%
w 7296
 
5.2%
t 6754
 
4.9%
Other values (15) 27955
20.1%
Other Punctuation
ValueCountFrequency (%)
/ 125
71.8%
: 45
 
25.9%
& 4
 
2.3%
Decimal Number
ValueCountFrequency (%)
6 69
82.1%
0 15
 
17.9%
Space Separator
ValueCountFrequency (%)
6747
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1149
100.0%
Open Punctuation
ValueCountFrequency (%)
( 113
100.0%
Close Punctuation
ValueCountFrequency (%)
) 16
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 139114
94.4%
Common 8283
 
5.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 19196
13.8%
r 13886
10.0%
s 12706
9.1%
a 12603
9.1%
h 11749
8.4%
i 10947
 
7.9%
n 8418
 
6.1%
o 7604
 
5.5%
w 7296
 
5.2%
t 6754
 
4.9%
Other values (15) 27955
20.1%
Common
ValueCountFrequency (%)
6747
81.5%
- 1149
 
13.9%
/ 125
 
1.5%
( 113
 
1.4%
6 69
 
0.8%
: 45
 
0.5%
) 16
 
0.2%
0 15
 
0.2%
& 4
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 147397
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 19196
13.0%
r 13886
 
9.4%
s 12706
 
8.6%
a 12603
 
8.6%
h 11749
 
8.0%
i 10947
 
7.4%
n 8418
 
5.7%
o 7604
 
5.2%
w 7296
 
4.9%
t 6754
 
4.6%
Other values (24) 36238
24.6%

heating
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 90
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 105.7 KiB
central 4799
forced air 2946
natural gas 2057
electric 894
none 736
Other values (85) 2078

Length

Max length: 31
Median length: 30
Mean length: 8.8778682
Min length: 2

Characters and Unicode

Total characters: 119940
Distinct characters: 36
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 23
Unique (%): 0.2%

Sample

1st row: none
2nd row: none
3rd row: hot water
4th row: none
5th row: natural gas

Common Values

Value               Count  Frequency (%)
central              4799  35.5%
forced air           2946  21.8%
natural gas          2057  15.2%
electric              894  6.6%
none                  736  5.4%
hot water             270  2.0%
fireplace(s           167  1.2%
radiator              143  1.1%
central forced air    131  1.0%
other                 128  0.9%
Other values (80)    1239  9.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
central 5016
24.5%
air 3241
15.8%
forced 3226
15.7%
gas 2191
10.7%
natural 2057
10.0%
electric 924
 
4.5%
none 736
 
3.6%
hot 329
 
1.6%
water 316
 
1.5%
fireplace(s 172
 
0.8%
Other values (88) 2283
11.1%

Most occurring characters

ValueCountFrequency (%)
a 16486
13.7%
r 16034
13.4%
e 12703
10.6%
c 10651
8.9%
t 9535
7.9%
n 9066
7.6%
l 8768
7.3%
6981
 
5.8%
i 5348
 
4.5%
o 5013
 
4.2%
Other values (26) 19355
16.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 112096
93.5%
Space Separator 6981
 
5.8%
Dash Punctuation 291
 
0.2%
Open Punctuation 208
 
0.2%
Decimal Number 180
 
0.2%
Other Punctuation 130
 
0.1%
Close Punctuation 36
 
< 0.1%
Math Symbol 18
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 16486
14.7%
r 16034
14.3%
e 12703
11.3%
c 10651
9.5%
t 9535
8.5%
n 9066
8.1%
l 8768
7.8%
i 5348
 
4.8%
o 5013
 
4.5%
d 3773
 
3.4%
Other values (14) 14719
13.1%
Decimal Number
ValueCountFrequency (%)
0 87
48.3%
9 87
48.3%
3 2
 
1.1%
1 2
 
1.1%
2 2
 
1.1%
Other Punctuation
ValueCountFrequency (%)
% 87
66.9%
/ 43
33.1%
Space Separator
ValueCountFrequency (%)
6981
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 291
100.0%
Open Punctuation
ValueCountFrequency (%)
( 208
100.0%
Close Punctuation
ValueCountFrequency (%)
) 36
100.0%
Math Symbol
ValueCountFrequency (%)
+ 18
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 112096
93.5%
Common 7844
 
6.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 16486
14.7%
r 16034
14.3%
e 12703
11.3%
c 10651
9.5%
t 9535
8.5%
n 9066
8.1%
l 8768
7.8%
i 5348
 
4.8%
o 5013
 
4.5%
d 3773
 
3.4%
Other values (14) 14719
13.1%
Common
ValueCountFrequency (%)
6981
89.0%
- 291
 
3.7%
( 208
 
2.7%
0 87
 
1.1%
% 87
 
1.1%
9 87
 
1.1%
/ 43
 
0.5%
) 36
 
0.5%
+ 18
 
0.2%
3 2
 
< 0.1%
Other values (2) 4
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 119940
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 16486
13.7%
r 16034
13.4%
e 12703
10.6%
c 10651
8.9%
t 9535
7.9%
n 9066
7.6%
l 8768
7.3%
6981
 
5.8%
i 5348
 
4.5%
o 5013
 
4.2%
Other values (26) 19355
16.1%

cooling
Categorical

Distinct: 116
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 105.7 KiB
Central Air 4906
Ceiling Fan(s 1543
Central Ai 1218
None 860
none 732
Other values (111) 4251

Length

Max length: 31
Median length: 30
Mean length: 10.387935
Min length: 2

Characters and Unicode

Total characters: 140341
Distinct characters: 59
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 22
Unique (%): 0.2%

Sample

1st row: none
2nd row: none
3rd row: Units
4th row: none
5th row: Wall Unit(s)

Common Values

Value               Count  Frequency (%)
Central Air          4906  36.3%
Ceiling Fan(s        1543  11.4%
Central Ai           1218  9.0%
None                  860  6.4%
none                  732  5.4%
Electri               549  4.1%
Central A/            514  3.8%
Electric              421  3.1%
Central Forced Air    311  2.3%
Refrigeration         215  1.6%
Other values (106)   2241  16.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
central 7259
29.8%
air 5563
22.9%
ceiling 1613
 
6.6%
fan(s 1613
 
6.6%
none 1592
 
6.5%
ai 1321
 
5.4%
forced 608
 
2.5%
a 555
 
2.3%
electri 549
 
2.3%
electric 423
 
1.7%
Other values (108) 3233
13.3%

Most occurring characters

ValueCountFrequency (%)
r 15330
10.9%
n 14689
10.5%
i 13703
9.8%
e 13374
9.5%
10821
7.7%
l 10762
7.7%
a 10108
7.2%
t 9836
 
7.0%
C 9149
 
6.5%
A 7588
 
5.4%
Other values (49) 24981
17.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 101713
72.5%
Uppercase Letter 24096
 
17.2%
Space Separator 10821
 
7.7%
Open Punctuation 1945
 
1.4%
Other Punctuation 804
 
0.6%
Close Punctuation 298
 
0.2%
Decimal Number 293
 
0.2%
Dash Punctuation 268
 
0.2%
Math Symbol 103
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 15330
15.1%
n 14689
14.4%
i 13703
13.5%
e 13374
13.1%
l 10762
10.6%
a 10108
9.9%
t 9836
9.7%
o 3724
 
3.7%
s 2444
 
2.4%
c 2372
 
2.3%
Other values (13) 5371
 
5.3%
Uppercase Letter
ValueCountFrequency (%)
C 9149
38.0%
A 7588
31.5%
F 2328
 
9.7%
E 1214
 
5.0%
N 935
 
3.9%
W 612
 
2.5%
U 461
 
1.9%
R 458
 
1.9%
H 234
 
1.0%
S 212
 
0.9%
Other values (11) 905
 
3.8%
Decimal Number
ValueCountFrequency (%)
1 92
31.4%
0 47
16.0%
9 47
16.0%
3 47
16.0%
5 26
 
8.9%
2 19
 
6.5%
6 15
 
5.1%
Other Punctuation
ValueCountFrequency (%)
/ 751
93.4%
% 47
 
5.8%
& 6
 
0.7%
Space Separator
ValueCountFrequency (%)
10821
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1945
100.0%
Close Punctuation
ValueCountFrequency (%)
) 298
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 268
100.0%
Math Symbol
ValueCountFrequency (%)
+ 103
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 125809
89.6%
Common 14532
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 15330
12.2%
n 14689
11.7%
i 13703
10.9%
e 13374
10.6%
l 10762
8.6%
a 10108
8.0%
t 9836
7.8%
C 9149
7.3%
A 7588
6.0%
o 3724
 
3.0%
Other values (34) 17546
13.9%
Common
ValueCountFrequency (%)
10821
74.5%
( 1945
 
13.4%
/ 751
 
5.2%
) 298
 
2.1%
- 268
 
1.8%
+ 103
 
0.7%
1 92
 
0.6%
% 47
 
0.3%
0 47
 
0.3%
9 47
 
0.3%
Other values (5) 113
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 140341
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 15330
10.9%
n 14689
10.5%
i 13703
9.8%
e 13374
9.5%
10821
7.7%
l 10762
7.7%
a 10108
7.2%
t 9836
 
7.0%
C 9149
 
6.5%
A 7588
 
5.4%
Other values (49) 24981
17.8%

stories
Real number (ℝ)

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6638046
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 105.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1.5
Q3: 2
95-th percentile: 3
Maximum: 7
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.67669563
Coefficient of variation (CV): 0.40671581
Kurtosis: 0.87803927
Mean: 1.6638046
Median Absolute Deviation (MAD): 0.5
Skewness: 0.92630735
Sum: 22478
Variance: 0.45791698
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
1 5378
39.8%
2 5079
37.6%
1.5 1565
 
11.6%
3 1152
 
8.5%
3.5 153
 
1.1%
2.5 94
 
0.7%
4 80
 
0.6%
5 7
 
0.1%
6 1
 
< 0.1%
7 1
 
< 0.1%
ValueCountFrequency (%)
1 5378
39.8%
1.5 1565
 
11.6%
2 5079
37.6%
2.5 94
 
0.7%
3 1152
 
8.5%
3.5 153
 
1.1%
4 80
 
0.6%
5 7
 
0.1%
6 1
 
< 0.1%
7 1
 
< 0.1%
ValueCountFrequency (%)
7 1
 
< 0.1%
6 1
 
< 0.1%
5 7
 
0.1%
4 80
 
0.6%
3.5 153
 
1.1%
3 1152
 
8.5%
2.5 94
 
0.7%
2 5079
37.6%
1.5 1565
 
11.6%
1 5378
39.8%
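
The descriptive statistics reported for stories can be cross-checked directly from the value counts listed above. A minimal sketch using only the Python standard library; it assumes the reported variance is the sample variance (n - 1 denominator), which is what the figures appear consistent with, and the variable names are illustrative:

```python
from statistics import mean, median, variance

# Value counts for "stories", copied from the frequency table above.
stories_counts = {1: 5378, 1.5: 1565, 2: 5079, 2.5: 94, 3: 1152,
                  3.5: 153, 4: 80, 5: 7, 6: 1, 7: 1}

# Expand the counts back into one value per observation.
values = [v for v, n in stories_counts.items() for _ in range(n)]

n_obs = len(values)
total = sum(values)
avg = mean(values)
med = median(values)
var = variance(values)   # sample variance (divides by n - 1)
std = var ** 0.5
cv = std / avg           # coefficient of variation = std / mean

print(n_obs, total, round(avg, 7), med)
print(round(var, 8), round(std, 8), round(cv, 8))
```

The same relationships hold for year in the earlier section: the interquartile range is Q3 minus Q1 (1999 - 1941 = 58) and the coefficient of variation is the standard deviation divided by the mean.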

materials
Categorical

Distinct: 161
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 105.7 KiB
brick 4257
none 2122
stucco 792
vinyl sideing 647
frame 552
Other values (156) 5140

Length

Max length: 30
Median length: 26
Mean length: 7.2612139
Min length: 3

Characters and Unicode

Total characters: 98099
Distinct characters: 34
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44
Unique (%): 0.3%

Sample

1st row: none
2nd row: none
3rd row: none
4th row: none
5th row: brick

Common Values

Value               Count  Frequency (%)
brick                4257  31.5%
none                 2122  15.7%
stucco                792  5.9%
vinyl sideing         647  4.8%
frame                 552  4.1%
masonry               476  3.5%
wood sideing          417  3.1%
frame - woo           387  2.9%
stone                 363  2.7%
block                 341  2.5%
Other values (151)   3156  23.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
brick 4636
26.1%
none 2122
12.0%
sideing 1534
 
8.6%
frame 977
 
5.5%
vinyl 844
 
4.8%
stucco 811
 
4.6%
wood 685
 
3.9%
masonry 618
 
3.5%
sidein 517
 
2.9%
421
 
2.4%
Other values (140) 4578
25.8%

Most occurring characters

ValueCountFrequency (%)
i 10885
11.1%
n 10044
 
10.2%
c 8123
 
8.3%
e 8122
 
8.3%
o 7631
 
7.8%
r 7496
 
7.6%
k 5374
 
5.5%
b 5317
 
5.4%
s 4964
 
5.1%
4233
 
4.3%
Other values (24) 25910
26.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 92939
94.7%
Space Separator 4233
 
4.3%
Dash Punctuation 539
 
0.5%
Other Punctuation 305
 
0.3%
Decimal Number 71
 
0.1%
Open Punctuation 10
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 10885
11.7%
n 10044
10.8%
c 8123
 
8.7%
e 8122
 
8.7%
o 7631
 
8.2%
r 7496
 
8.1%
k 5374
 
5.8%
b 5317
 
5.7%
s 4964
 
5.3%
d 3456
 
3.7%
Other values (14) 21527
23.2%
Other Punctuation
ValueCountFrequency (%)
/ 274
89.8%
& 25
 
8.2%
. 6
 
2.0%
Dash Punctuation
ValueCountFrequency (%)
- 416
77.2%
– 123
 
22.8%
Decimal Number
ValueCountFrequency (%)
4 53
74.6%
3 18
 
25.4%
Space Separator
ValueCountFrequency (%)
4233
100.0%
Open Punctuation
ValueCountFrequency (%)
( 10
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 92939
94.7%
Common 5160
 
5.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 10885
11.7%
n 10044
10.8%
c 8123
 
8.7%
e 8122
 
8.7%
o 7631
 
8.2%
r 7496
 
8.1%
k 5374
 
5.8%
b 5317
 
5.7%
s 4964
 
5.3%
d 3456
 
3.7%
Other values (14) 21527
23.2%
Common
ValueCountFrequency (%)
4233
82.0%
- 416
 
8.1%
/ 274
 
5.3%
– 123
 
2.4%
4 53
 
1.0%
& 25
 
0.5%
3 18
 
0.3%
( 10
 
0.2%
. 6
 
0.1%
) 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 97976
99.9%
Punctuation 123
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 10885
11.1%
n 10044
 
10.3%
c 8123
 
8.3%
e 8122
 
8.3%
o 7631
 
7.8%
r 7496
 
7.7%
k 5374
 
5.5%
b 5317
 
5.4%
s 4964
 
5.1%
4233
 
4.3%
Other values (23) 25787
26.3%
Punctuation
ValueCountFrequency (%)
– 123
100.0%

roof
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 286
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 105.7 KiB
none 4551
Composition 4193
Shingle 1376
Tile 563
Asphalt 453
Other values (281) 2374

Length

Max length: 57
Median length: 50
Mean length: 8.0532198
Min length: 4

Characters and Unicode

Total characters: 108799
Distinct characters: 51
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 149
Unique (%): 1.1%

Sample

1st row: none
2nd row: none
3rd row: none
4th row: none
5th row: none

Common Values

Value                Count  Frequency (%)
none                  4551  33.7%
Composition           4193  31.0%
Shingle               1376  10.2%
Tile                   563  4.2%
Asphalt                453  3.4%
Comp Shingle           285  2.1%
Metal                  194  1.4%
Other                  182  1.3%
Composition,Shingle    154  1.1%
Flat                   151  1.1%
Other values (276)    1408  10.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none 4554
31.3%
composition 4219
29.0%
shingle 1722
 
11.9%
tile 633
 
4.4%
asphalt 455
 
3.1%
comp 408
 
2.8%
metal 194
 
1.3%
other 190
 
1.3%
updated 187
 
1.3%
flat 162
 
1.1%
Other values (272) 1806
 
12.4%

Most occurring characters

ValueCountFrequency (%)
o 19562
18.0%
n 16111
14.8%
i 12542
11.5%
e 9134
8.4%
t 6642
 
6.1%
p 5965
 
5.5%
s 5480
 
5.0%
m 5294
 
4.9%
C 5201
 
4.8%
l 4768
 
4.4%
Other values (41) 18100
16.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 95543
87.8%
Uppercase Letter 11071
 
10.2%
Other Punctuation 1087
 
1.0%
Space Separator 1020
 
0.9%
Dash Punctuation 54
 
< 0.1%
Open Punctuation 11
 
< 0.1%
Close Punctuation 11
 
< 0.1%
Decimal Number 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 19562
20.5%
n 16111
16.9%
i 12542
13.1%
e 9134
9.6%
t 6642
 
7.0%
p 5965
 
6.2%
s 5480
 
5.7%
m 5294
 
5.5%
l 4768
 
5.0%
h 3183
 
3.3%
Other values (13) 6862
 
7.2%
Uppercase Letter
ValueCountFrequency (%)
C 5201
47.0%
S 2420
21.9%
T 838
 
7.6%
A 610
 
5.5%
F 495
 
4.5%
M 352
 
3.2%
O 272
 
2.5%
U 265
 
2.4%
R 260
 
2.3%
H 83
 
0.7%
Other values (10) 275
 
2.5%
Other Punctuation
ValueCountFrequency (%)
, 908
83.5%
/ 179
 
16.5%
Decimal Number
ValueCountFrequency (%)
1 1
50.0%
5 1
50.0%
Space Separator
ValueCountFrequency (%)
1020
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 54
100.0%
Open Punctuation
ValueCountFrequency (%)
( 11
100.0%
Close Punctuation
ValueCountFrequency (%)
) 11
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 106614
98.0%
Common 2185
 
2.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 19562
18.3%
n 16111
15.1%
i 12542
11.8%
e 9134
8.6%
t 6642
 
6.2%
p 5965
 
5.6%
s 5480
 
5.1%
m 5294
 
5.0%
C 5201
 
4.9%
l 4768
 
4.5%
Other values (33) 15915
14.9%
Common
ValueCountFrequency (%)
1020
46.7%
, 908
41.6%
/ 179
 
8.2%
- 54
 
2.5%
( 11
 
0.5%
) 11
 
0.5%
1 1
 
< 0.1%
5 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 108799
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 19562
18.0%
n 16111
14.8%
i 12542
11.5%
e 9134
8.4%
t 6642
 
6.1%
p 5965
 
5.5%
s 5480
 
5.0%
m 5294
 
4.9%
C 5201
 
4.8%
l 4768
 
4.4%
Other values (41) 18100
16.6%

foundation
Categorical

Distinct: 49
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 105.7 KiB
none 5981
slab 3710
concrete perimeter 712
crawl space 493
pillar/post/pier 491
Other values (44) 2123

Length

Max length: 27
Median length: 4
Mean length: 6.4080681
Min length: 4

Characters and Unicode

Total characters: 86573
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 0.1%

Sample

1st row: none
2nd row: none
3rd row: none
4th row: none
5th row: none

Common Values

Value               Count  Frequency (%)
none                 5981  44.3%
slab                 3710  27.5%
concrete perimeter    712  5.3%
crawl space           493  3.6%
pillar/post/pier      491  3.6%
poured concrete       465  3.4%
blogck                336  2.5%
other                 236  1.7%
brick/mortar          163  1.2%
stone                 160  1.2%
Other values (39)     763  5.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none 5981
38.1%
slab 3840
24.5%
concrete 1309
 
8.3%
perimeter 770
 
4.9%
crawl 493
 
3.1%
space 493
 
3.1%
pillar/post/pier 491
 
3.1%
poured 465
 
3.0%
blogck 359
 
2.3%
other 238
 
1.5%
Other values (38) 1240
 
7.9%

Most occurring characters

ValueCountFrequency (%)
n 13884
16.0%
e 13326
15.4%
o 9650
11.1%
r 6103
7.0%
l 6031
7.0%
a 6018
7.0%
s 5281
 
6.1%
b 4728
 
5.5%
c 4407
 
5.1%
p 3527
 
4.1%
Other values (15) 13618
15.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 83031
95.9%
Space Separator 2169
 
2.5%
Other Punctuation 1373
 
1.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 13884
16.7%
e 13326
16.0%
o 9650
11.6%
r 6103
7.4%
l 6031
7.3%
a 6018
7.2%
s 5281
 
6.4%
b 4728
 
5.7%
c 4407
 
5.3%
p 3527
 
4.2%
Other values (12) 10076
12.1%
Other Punctuation
ValueCountFrequency (%)
/ 1335
97.2%
& 38
 
2.8%
Space Separator
ValueCountFrequency (%)
2169
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 83031
95.9%
Common 3542
 
4.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 13884
16.7%
e 13326
16.0%
o 9650
11.6%
r 6103
7.4%
l 6031
7.3%
a 6018
7.2%
s 5281
 
6.4%
b 4728
 
5.7%
c 4407
 
5.3%
p 3527
 
4.2%
Other values (12) 10076
12.1%
Common
ValueCountFrequency (%)
2169
61.2%
/ 1335
37.7%
& 38
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 86573
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 13884
16.0%
e 13326
15.4%
o 9650
11.1%
r 6103
7.0%
l 6031
7.0%
a 6018
7.0%
s 5281
 
6.1%
b 4728
 
5.5%
c 4407
 
5.1%
p 3527
 
4.1%
Other values (15) 13618
15.7%

interior
Categorical

Distinct: 448
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 105.7 KiB
none 4024
Ceiling Fan(s 591
Cable TV Availabl 584
Built-in Feature 550
One Living Are 360
Other values (443) 7401

Length

Max length: 38
Median length: 29
Mean length: 11.323094
Min length: 2

Characters and Unicode

Total characters: 152975
Distinct characters: 67
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 186
Unique (%): 1.4%

Sample

1st row: none
2nd row: none
3rd row: none
4th row: none
5th row: Formal Dining Roo

Common Values

Value               Count  Frequency (%)
none                 4024  29.8%
Ceiling Fan(s         591  4.4%
Cable TV Availabl     584  4.3%
Built-in Feature      550  4.1%
One Living Are        360  2.7%
Walk-In Closet(s      346  2.6%
Breakfast Ba          253  1.9%
Walk-In Closet(s)     241  1.8%
High Ceiling          240  1.8%
Two Living Are        239  1.8%
Other values (438)   6082  45.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none 4032
 
15.7%
ceiling 1200
 
4.7%
living 793
 
3.1%
are 720
 
2.8%
cable 709
 
2.8%
built-in 687
 
2.7%
fan(s 677
 
2.6%
tv 646
 
2.5%
availabl 606
 
2.4%
closet(s 599
 
2.3%
Other values (437) 14979
58.4%

Most occurring characters

ValueCountFrequency (%)
n 16807
 
11.0%
e 15575
 
10.2%
12138
 
7.9%
i 11087
 
7.2%
o 10958
 
7.2%
a 10083
 
6.6%
l 8915
 
5.8%
t 7287
 
4.8%
r 6927
 
4.5%
s 4250
 
2.8%
Other values (57) 48948
32.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 112713
73.7%
Uppercase Letter 22563
 
14.7%
Space Separator 12138
 
7.9%
Dash Punctuation 1931
 
1.3%
Open Punctuation 1644
 
1.1%
Decimal Number 1007
 
0.7%
Other Punctuation 428
 
0.3%
Close Punctuation 396
 
0.3%
Math Symbol 155
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 16807
14.9%
e 15575
13.8%
i 11087
9.8%
o 10958
9.7%
a 10083
8.9%
l 8915
7.9%
t 7287
6.5%
r 6927
 
6.1%
s 4250
 
3.8%
g 3350
 
3.0%
Other values (14) 17474
15.5%
Uppercase Letter
ValueCountFrequency (%)
C 3834
17.0%
B 3086
13.7%
F 2231
9.9%
A 1948
 
8.6%
L 1201
 
5.3%
T 1183
 
5.2%
I 1115
 
4.9%
V 1024
 
4.5%
W 864
 
3.8%
O 819
 
3.6%
Other values (14) 5258
23.3%
Decimal Number
ValueCountFrequency (%)
9 247
24.5%
2 247
24.5%
1 183
18.2%
0 117
11.6%
5 38
 
3.8%
8 38
 
3.8%
3 37
 
3.7%
6 37
 
3.7%
4 37
 
3.7%
7 26
 
2.6%
Other Punctuation
ValueCountFrequency (%)
/ 238
55.6%
: 119
27.8%
% 70
 
16.4%
& 1
 
0.2%
Space Separator
ValueCountFrequency (%)
12138
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1931
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1644
100.0%
Close Punctuation
ValueCountFrequency (%)
) 396
100.0%
Math Symbol
ValueCountFrequency (%)
+ 155
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 135276
88.4%
Common 17699
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 16807
12.4%
e 15575
 
11.5%
i 11087
 
8.2%
o 10958
 
8.1%
a 10083
 
7.5%
l 8915
 
6.6%
t 7287
 
5.4%
r 6927
 
5.1%
s 4250
 
3.1%
C 3834
 
2.8%
Other values (38) 39553
29.2%
Common
ValueCountFrequency (%)
12138
68.6%
- 1931
 
10.9%
( 1644
 
9.3%
) 396
 
2.2%
9 247
 
1.4%
2 247
 
1.4%
/ 238
 
1.3%
1 183
 
1.0%
+ 155
 
0.9%
: 119
 
0.7%
Other values (9) 401
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 152975
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 16807
 
11.0%
e 15575
 
10.2%
12138
 
7.9%
i 11087
 
7.2%
o 10958
 
7.2%
a 10083
 
6.6%
l 8915
 
5.8%
t 7287
 
4.8%
r 6927
 
4.5%
s 4250
 
2.8%
Other values (57) 48948
32.0%

Interactions

Correlations

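(variable) | price | lat | long | bath | bed | living | lot_a | covered | garage | total_spaces | year | stories | status | add_attr | state | fireplace | subtype | sewer | water | heating | foundation
price | 1.000 | 0.051 | -0.382 | 0.572 | 0.404 | 0.570 | 0.108 | -0.004 | -0.002 | 0.105 | -0.068 | -0.013 | 0.000 | 0.041 | 0.054 | 0.220 | 0.120 | 0.000 | 0.000 | 0.000 | 0.000
lat | 0.051 | 1.000 | 0.319 | -0.033 | -0.018 | -0.060 | -0.391 | -0.148 | -0.068 | -0.161 | -0.370 | 0.201 | 0.168 | 0.270 | 0.821 | 0.217 | 0.263 | 0.287 | 0.333 | 0.312 | 0.379
long | -0.382 | 0.319 | 1.000 | -0.066 | -0.056 | -0.076 | -0.093 | -0.083 | 0.018 | -0.210 | -0.123 | 0.180 | 0.194 | 0.218 | 0.782 | 0.320 | 0.319 | 0.496 | 0.507 | 0.323 | 0.391
bath | 0.572 | -0.033 | -0.066 | 1.000 | 0.648 | 0.787 | 0.189 | 0.038 | 0.038 | 0.057 | 0.037 | 0.036 | 0.018 | 0.044 | 0.066 | 0.187 | 0.096 | 0.000 | 0.028 | 0.000 | 0.000
bed | 0.404 | -0.018 | -0.056 | 0.648 | 1.000 | 0.685 | 0.199 | 0.039 | 0.041 | 0.053 | 0.023 | 0.032 | 0.019 | 0.064 | 0.085 | 0.155 | 0.076 | 0.043 | 0.059 | 0.000 | 0.014
living | 0.570 | -0.060 | -0.076 | 0.787 | 0.685 | 1.000 | 0.298 | 0.046 | 0.041 | 0.048 | 0.060 | 0.032 | 0.000 | 0.029 | 0.044 | 0.180 | 0.088 | 0.000 | 0.000 | 0.000 | 0.000
lot_a | 0.108 | -0.391 | -0.093 | 0.189 | 0.199 | 0.298 | 1.000 | 0.101 | 0.082 | 0.108 | 0.227 | -0.062 | 0.030 | 0.048 | 0.103 | 0.118 | 0.040 | 0.026 | 0.012 | 0.027 | 0.000
covered | -0.004 | -0.148 | -0.083 | 0.038 | 0.039 | 0.046 | 0.101 | 1.000 | 0.826 | 0.757 | 0.234 | 0.096 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
garage | -0.002 | -0.068 | 0.018 | 0.038 | 0.041 | 0.041 | 0.082 | 0.826 | 1.000 | 0.655 | 0.219 | 0.106 | 0.000 | 0.000 | 0.003 | 0.000 | 0.029 | 0.000 | 0.000 | 0.000 | 0.000
total_spaces | 0.105 | -0.161 | -0.210 | 0.057 | 0.053 | 0.048 | 0.108 | 0.757 | 0.655 | 1.000 | 0.191 | 0.035 | 0.000 | 0.000 | 0.013 | 0.133 | 0.038 | 0.000 | 0.000 | 0.000 | 0.000
year | -0.068 | -0.370 | -0.123 | 0.037 | 0.023 | 0.060 | 0.227 | 0.234 | 0.219 | 0.191 | 1.000 | 0.110 | 0.180 | 0.058 | 0.173 | 0.090 | 0.056 | 0.071 | 0.225 | 0.234 | 0.202
stories | -0.013 | 0.201 | 0.180 | 0.036 | 0.032 | 0.032 | -0.062 | 0.096 | 0.106 | 0.035 | 0.110 | 1.000 | 0.040 | 0.185 | 0.169 | 0.175 | 0.087 | 0.074 | 0.058 | 0.134 | 0.174
status | 0.000 | 0.168 | 0.194 | 0.018 | 0.019 | 0.000 | 0.030 | 0.000 | 0.000 | 0.000 | 0.180 | 0.040 | 1.000 | 0.109 | 0.327 | 0.073 | 0.087 | 0.224 | 0.276 | 0.167 | 0.194
add_attr | 0.041 | 0.270 | 0.218 | 0.044 | 0.064 | 0.029 | 0.048 | 0.000 | 0.000 | 0.000 | 0.058 | 0.185 | 0.109 | 1.000 | 0.562 | 0.329 | 0.176 | 0.752 | 0.705 | 0.433 | 0.458
state | 0.054 | 0.821 | 0.782 | 0.066 | 0.085 | 0.044 | 0.103 | 0.000 | 0.003 | 0.013 | 0.173 | 0.169 | 0.327 | 0.562 | 1.000 | 0.221 | 0.307 | 0.381 | 0.418 | 0.393 | 0.413
fireplace | 0.220 | 0.217 | 0.320 | 0.187 | 0.155 | 0.180 | 0.118 | 0.000 | 0.000 | 0.133 | 0.090 | 0.175 | 0.073 | 0.329 | 0.221 | 1.000 | 0.094 | 0.120 | 0.118 | 0.065 | 0.103
subtype | 0.120 | 0.263 | 0.319 | 0.096 | 0.076 | 0.088 | 0.040 | 0.000 | 0.029 | 0.038 | 0.056 | 0.087 | 0.087 | 0.176 | 0.307 | 0.094 | 1.000 | 0.161 | 0.231 | 0.249 | 0.158
sewer | 0.000 | 0.287 | 0.496 | 0.000 | 0.043 | 0.000 | 0.026 | 0.000 | 0.000 | 0.000 | 0.071 | 0.074 | 0.224 | 0.752 | 0.381 | 0.120 | 0.161 | 1.000 | 0.469 | 0.128 | 0.189
water | 0.000 | 0.333 | 0.507 | 0.028 | 0.059 | 0.000 | 0.012 | 0.000 | 0.000 | 0.000 | 0.225 | 0.058 | 0.276 | 0.705 | 0.418 | 0.118 | 0.231 | 0.469 | 1.000 | 0.158 | 0.218
heating | 0.000 | 0.312 | 0.323 | 0.000 | 0.000 | 0.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.234 | 0.134 | 0.167 | 0.433 | 0.393 | 0.065 | 0.249 | 0.128 | 0.158 | 1.000 | 0.241
foundation | 0.000 | 0.379 | 0.391 | 0.000 | 0.014 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.202 | 0.174 | 0.194 | 0.458 | 0.413 | 0.103 | 0.158 | 0.189 | 0.218 | 0.241 | 1.000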
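The matrix above is symmetric with 1.000 on the diagonal, as expected for a pairwise correlation table. This section does not say which measure produced the values, so the sketch below uses Pearson's r purely as an illustration of how such a table is built for numeric columns. The data is a made-up toy example and the helper names `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict m[a][b] of pairwise correlations."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy data (an assumption, not rows from the report).
columns = {
    "bath": [1, 2, 3, 4],
    "bed": [3, 4, 3, 4],
    "living": [1500, 1900, 2500, 1900],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, " ".join(f"{value:.3f}" for value in row.values()))
```

Categorical columns such as state or sewer need a different association measure, which this sketch does not cover.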

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
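Both views reduce to a per-cell test for missing values. A minimal sketch of the idea, assuming records are held as a list of dicts with `None` marking a missing cell (the rows and the function names `nullity_by_column` and `nullity_matrix` are hypothetical, not from the report):

```python
def nullity_by_column(rows, columns):
    """Count missing (None) cells per column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


def nullity_matrix(rows, columns):
    """One string per row: '#' for a present cell, '.' for a missing one."""
    return [
        "".join("." if r.get(c) is None else "#" for c in columns)
        for r in rows
    ]


# Made-up records for illustration only.
columns = ["price", "sewer", "water"]
rows = [
    {"price": 274000, "sewer": None, "water": "Public"},
    {"price": None, "sewer": "public sewer", "water": "Public"},
    {"price": 599000, "sewer": "none", "water": None},
]

print(nullity_by_column(rows, columns))
for line in nullity_matrix(rows, columns):
    print(line)
```

Note that the string "none" counts as a present value here. That matches this report, where the overview lists 0 missing cells even though many sample fields read "none".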

Sample

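(index) | price | status | add_attr | city | state | lat | long | bath | bed | living | lot_a | fireplace | parking | covered | garage | total_spaces | subtype | year | sewer | water | app | heating | cooling | stories | materials | roof | foundation | interior
0 | 274000 | House for sale | True | South Ozone Park | NY | 40.675730 | -73.822350 | 1 | 3 | 1556 | 2400.000000 | none | gasrage - detached | 2 | 2 | 2 | none | 1930 | none | none | none | none | none | 1.5 | none | none | none | none
1 | 270000 | House for sale | True | Jamaica | NY | 40.670036 | -73.780450 | 2 | 4 | 1920 | 3998.000000 | none | none | 2 | 2 | 2 | none | 1950 | none | none | microwave | none | none | 1.5 | none | none | none | none
2 | 899000 | House for sale | True | Staten Island | NY | 40.524227 | -74.215790 | 3 | 3 | 2532 | 6903.000000 | none | dentache | 2 | 2 | 2 | Single Family - Detached | 1899 | public sewer | none | dishwasher | hot water | Units | 2.0 | none | none | none | none
3 | 1390000 | House for sale | True | Flushing | NY | 40.721615 | -73.820755 | 4 | 4 | 1915 | 2697.000000 | none | none | 2 | 2 | 2 | none | 1945 | none | none | dryer | none | none | 1.5 | none | none | none | none
4 | 1380000 | House for sale | True | Brooklyn | NY | 40.604470 | -73.943960 | 4 | 3 | 1800 | 2000.000000 | none | shared drivewa | 1 | 1 | 1 | Single Family Residence | 1930 | public sewer | Public | dishwasher | natural gas | Wall Unit(s) | 2.0 | brick | none | none | Formal Dining Roo
5 | 599000 | House for sale | True | Brooklyn | NY | 40.639286 | -73.941270 | 2 | 3 | 1344 | 1950.000000 | none | none | 2 | 2 | 2 | none | 1925 | none | none | dryer | none | none | 1.5 | none | none | none | none
6 | 1280000 | House for sale | True | Brooklyn | NY | 40.622246 | -74.017830 | 2 | 3 | 1422 | 2613.000000 | none | gasrage - detached | 1 | 1 | 1 | none | 1920 | none | none | dishwasher | radian | Central | 1.5 | none | Shake / Shingle | none | none
7 | 899000 | House for sale | True | Brooklyn | NY | 40.578552 | -74.005196 | 3 | 4 | 2800 | 3000.000000 | none | none | 2 | 1 | 2 | none | 1945 | none | none | dishwasher | none | none | 1.5 | none | none | none | none
8 | 139000 | House for sale | True | College Pt | NY | 40.779743 | -73.848880 | 1 | 2 | 2405 | 8914.857468 | none | parking logt | 2 | 2 | 2 | Single Family Residence | 1986 | septic tank | Public | dishwasher | propane | Wall Unit(s) | 1.5 | fiberglass insulation | none | none | Eat-in Kitche
9 | 599000 | House for sale | True | Woodside | NY | 40.757385 | -73.898420 | 3 | 3 | 2406 | 2140.000000 | none | privat | 2 | 2 | 2 | Single Family Residence | 1940 | public sewer | Public | none | natural gas | None | 1.5 | brick | none | none | none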
(index) | price | status | add_attr | city | state | lat | long | bath | bed | living | lot_a | fireplace | parking | covered | garage | total_spaces | subtype | year | sewer | water | app | heating | cooling | stories | materials | roof | foundation | interior
13500 | 599000 | House for sale | True | Nashville | TN | 36.104990 | -86.745760 | 3 | 3 | 2532 | 8712.000000 | none | driveway | 2 | 2 | 2 | Single Family Residence | 1976 | public sewer | Public | electric water heater | forced air | Central A/ | 2.0 | brick | none | brick/mortar | none
13501 | 299000 | House for sale | True | Nashville | TN | 36.219044 | -86.756760 | 1 | 3 | 960 | 17859.600000 | none | none | 2 | 2 | 2 | Single Family Residence | 1947 | public sewer | Public | oven/range - gas | hot water | Central A/ | 3.0 | brick | Wood,Other | slab | none
13502 | 1149900 | House for sale | True | Nashville | TN | 36.193188 | -86.777830 | 4 | 4 | 2012 | 871.200000 | none | off street | 2 | 2 | 2 | Single Family Residence | 1936 | public sewer | Public | gas water heater | forced air | None | 3.0 | brick | none | brick/mortar | Dining Are
13503 | 3675000 | House for sale | True | Nashville | TN | 36.123570 | -86.795580 | 6 | 5 | 5704 | 12632.400000 | none | concrete drivewa | 2 | 1 | 2 | Single Family Residence | 1953 | public sewer | Public | built-in microwave | central | Central A/ | 3.0 | brick | Flat | slab | Dining Are
13504 | 750000 | House for sale | True | Nashville | TN | 36.160694 | -86.847830 | 3 | 3 | 2208 | 4356.000000 | none | drivewa | 2 | 2 | 2 | Single Family Residence | 1951 | public sewer | Public | gas water heater | forced air | Central A/ | 3.0 | brick | none | slab | none
13505 | 1479000 | House for sale | True | Nashville | TN | 36.123120 | -86.778210 | 5 | 5 | 3573 | 9601.743659 | none | gasrage faces fron | 1 | 1 | 1 | Single Family Residence | 1910 | private sewer | Public | electric water heater | central | Central A/ | 3.0 | other | none | brick/mortar | none
13506 | 575000 | House for sale | True | Nashville | TN | 36.204834 | -86.738660 | 2 | 4 | 3658 | 9147.600000 | none | on stree | 2 | 1 | 2 | Single Family Residence | 1940 | public sewer | Public | dishwasher | forced air | Central A/ | 3.0 | brick | none | slab | Kitchen - Gourme
13507 | 609000 | House for sale | True | Nashville | TN | 36.196106 | -86.735115 | 3 | 3 | 1950 | 871.200000 | none | on street | 2 | 2 | 2 | Single Family Residence | 1925 | public sewer | Public | gas water heater | hot water | Ductless/Mini-Spli | 3.0 | brick | none | brick/mortar | Ceiling Fan(s)
13508 | 460000 | Coming soon | True | Nashville | TN | 36.026558 | -86.717710 | 2 | 3 | 2224 | 7840.800000 | 2.0 | drivewa | 2 | 2 | 2 | Single Family Residence | 1937 | public sewer | Public | built-in microwave | forced air | Central A/ | 3.0 | combination | Unknown | other | Ceiling Fan(s
13509 | 4300000 | House for sale | True | Nashville | TN | 36.134396 | -86.822710 | 5 | 5 | 4516 | 15681.600000 | none | on street | 2 | 2 | 2 | Single Family Residence | 1925 | public sewer | Public | gas water heater | hot water | Central A/ | 3.0 | brick | Flat,Rubber | other | none